Photo by CDC on Unsplash

Disclaimer: The purpose of the Johns Hopkins IDD COVIDScenarioPipeline project is to provide tools for analysis of COVID-19-related data. These materials do not cover all aspects of the research process. We highly suggest that you seek external consultation from scientific experts regarding your data and the interpretation of your data.

This tutorial assumes that users have knowledge of R programming and limited command line experience. It does not require previous knowledge of GitHub. However, the tutorial should still be doable by someone without R programming or command line experience.

GitHub

If you already have a GitHub account, you can skip this step and move on to the Getting Started with the Pipeline section.

What is GitHub?

GitHub is a site that allows users to host and manage code and data files. Thus, you can store your code on the web so that you and others can easily access it (and so that it is safe if something happens to your computer!).

It is especially useful for what is called version control, which allows you to track changes to documents over time.

So although it is intended for version control of code, you can actually use GitHub for version control of many types of documents.
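If you are curious what "tracking changes over time" looks like in practice, here is a minimal sketch using git, the version control tool that GitHub is built around. It assumes git is already installed. The file name notes.txt, the placeholder name and email, and the temporary folder are made up for this demo and are not part of the pipeline.

```shell
# Create a throwaway folder so nothing else on your computer is changed.
demo_dir="$(mktemp -d)"

# Turn the folder into a git repository.
git -C "$demo_dir" init -q

# Helper that saves a version with a message.
# The name and email are placeholders used only for this demo.
save_version() {
  git -C "$demo_dir" \
    -c user.name="Example User" \
    -c user.email="user@example.com" \
    commit -q -m "$1"
}

# Save a first version of a document.
echo "Draft 1 of my analysis notes" > "$demo_dir/notes.txt"
git -C "$demo_dir" add notes.txt
save_version "Add first draft of notes"

# Change the document and save a second version.
echo "Draft 2 of my analysis notes" > "$demo_dir/notes.txt"
git -C "$demo_dir" add notes.txt
save_version "Revise notes"

# List the saved versions, newest first.
git -C "$demo_dir" log --oneline

# Count how many versions have been saved.
commit_count="$(git -C "$demo_dir" rev-list --count HEAD)"
echo "Number of saved versions: $commit_count"
```

Each saved version keeps its own message, so you can later see what changed and when. You do not need to run this to follow the rest of the tutorial.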

Why do I need an account?

By signing up for an account, you can access up-to-date files and code for the COVIDScenarioPipeline, which makes it easy to run the pipeline on your data.

Better yet, if you learn more about GitHub, you can also use your account to save the files and code for your analysis and track changes over time. You can share your analysis privately with just your team or you can even make it public for others to use.

To learn more about GitHub see here.

Create a GitHub Account:

  1. Click this link

You will see a page that looks something like this:

  2. Fill out a username (any name that works for you), email, and password
  3. Click the green “sign up for GitHub” button

Getting Started with the Pipeline

Making a pipeline repository

First, navigate to the Johns Hopkins IDD COVIDScenarioPipeline GitHub repository by clicking here.

You will see a page that looks like this:

Click on the green button that says “Use this template” as shown in the above image.


This will take you to a new page that looks like this:

Here you will:

  1. Provide the name for your repository that you are about to create - “COVID_Pipeline” would work
  2. Decide if you want your repository to be Public or Private
  3. Press the green “Create repository from template” button

Great! Now you have a repository on GitHub which contains all the current COVIDScenarioPipeline files and code.

It should look something like this:

Leave this open! You will want this for the next step!


Get the pipeline files onto your computer

  1. Press the green “Clone or download” button in the GitHub repository that you just created

  2. This will bring up a small window. Press the small button with an icon that looks like a clipboard. This will copy the location of your repository on GitHub.

  3. Open a new project in RStudio (if this is new for you, see New to R or RStudio)

  4. Select the Terminal tab in RStudio

  5. In the terminal, type the following words followed by a space (but do not press Enter yet):

git clone

  6. Paste what is on your clipboard by using either a keyboard shortcut or Edit –> Paste in RStudio

It should look something like this after the dollar sign $:

git clone https://github.com/yourgithubusername/COVID_Pipeline.git

Your GitHub username appears between “github.com” and the name of the repository you created.

  7. Press Enter

You should see some messages like:

Cloning into 'COVID_Pipeline'...

Once it is complete you will see that you now have a directory named the same as your GitHub repository that contains all the files in the repository.
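If the clone does not work, it can help to check how the address is put together. The sketch below only builds and prints the command and then looks for the new directory. It does not contact GitHub. The values yourgithubusername and COVID_Pipeline are the placeholder names used in this tutorial, so swap in your own.

```shell
# Placeholder values: replace with your own GitHub username and repository name.
github_user="yourgithubusername"
repo_name="COVID_Pipeline"

# The address is "github.com", then your username, then the repository name.
clone_url="https://github.com/${github_user}/${repo_name}.git"

# Print the full command so you can compare it with what you typed.
echo "git clone ${clone_url}"

# After cloning, a directory with the repository's name should exist
# in the folder where you ran the command.
if [ -d "$repo_name" ]; then
  echo "Found directory: $repo_name"
else
  echo "No directory named $repo_name here yet"
fi
```

If the directory is not found, check that you are in the folder where you ran git clone and that the repository name matches exactly, including capital letters.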

Great, now we have the files we need on our local computer!


Get the required R and Python tools onto your computer

To get the exact versions of the required R packages and Python packages, modules, and scripts, we can simply use something called Docker.

If you are new to Docker and need to set up an account, go to the New to Docker section of the tutorial.
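If you are not sure which tools you already have, a quick check from the Terminal tab can tell you. This is a rough sketch, not an official setup script. It assumes that the commands are named git, Rscript (which is installed along with R), and docker. A tool reported as "not found" may still be installed somewhere your terminal does not look.

```shell
# Tools used in this tutorial: git (cloning), Rscript (comes with R), docker.
missing=""
for tool in git Rscript docker; do
  if command -v "$tool" >/dev/null 2>&1; then
    # Show where the tool was found.
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: not found"
    missing="$missing $tool"
  fi
done

# Summarize what, if anything, still needs to be installed.
if [ -z "$missing" ]; then
  echo "All three tools are available."
else
  echo "Still to install:$missing"
fi
```

If docker is listed as still to install, the New to Docker section walks through getting it. If Rscript is listed, see New to R or RStudio.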

New to R or RStudio


Download and install R and RStudio

If you are new to R or RStudio, don't worry! You can follow these simple steps to get started.

You will need to download and install RStudio (and possibly R if you do not already have it installed).

To do so follow this tutorial.

Create an RStudio project

  1. Go to File –> New Project

  2. Choose the directory for your COVID project - likely you would want “New Directory”

  3. Select “New Project” as the Project Type

Note: you may not see all of the same options as shown here

  4. If you selected a new directory, then designate the name of that new directory and double-check that its location is somewhere on your computer that you would want. Perhaps COVID_Pipeline would be a good name. We will use this in our examples.

Great! Now you are ready to start using RStudio for the COVIDScenarioPipeline. Return to the Getting Started with the Pipeline section of the tutorial.

New to Docker


What is Docker?

Why do I need an account?

Create a Docker Account:

  1. Click this link

You will see a page that looks something like this:

  2. Fill out a Docker ID (any name that works for you), email, and password - click that you are not a robot
  3. Click the blue “Sign Up” button
  4. You will be taken to a new window - Select the free Community Docker Plan
  5. Verify your account through your email
  6. This will take you to a new window that looks like this:

Click on “Get started with Docker Desktop”

  7. This will take you to a window with an image like this, which should have a button below it for downloading Docker Desktop on your computer:

Note: This may take some time!


Summary